11 research outputs found

    New Frameworks for Offline and Streaming Coreset Constructions

    Full text link
    A coreset for a set of points is a small subset of weighted points that approximately preserves important properties of the original set. Specifically, if $P$ is a set of points, $Q$ is a set of queries, and $f:P\times Q\to\mathbb{R}$ is a cost function, then a set $S\subseteq P$ with weights $w:P\to[0,\infty)$ is an $\epsilon$-coreset for some parameter $\epsilon>0$ if $\sum_{s\in S}w(s)f(s,q)$ is a $(1+\epsilon)$ multiplicative approximation to $\sum_{p\in P}f(p,q)$ for all $q\in Q$. Coresets are used to solve fundamental problems in machine learning under various big data models of computation. Many of the suggested coresets in the recent decade used, or could have used a general framework for constructing coresets whose size depends quadratically on what is known as total sensitivity $t$. In this paper we improve this bound from $O(t^2)$ to $O(t\log t)$. Thus our results imply more space efficient solutions to a number of problems, including projective clustering, $k$-line clustering, and subspace approximation. Moreover, we generalize the notion of sensitivity sampling for sup-sampling that supports non-multiplicative approximations, negative cost functions and more. The main technical result is a generic reduction to the sample complexity of learning a class of functions with bounded VC dimension. We show that obtaining an $(\nu,\alpha)$-sample for this class of functions with appropriate parameters $\nu$ and $\alpha$ suffices to achieve space efficient $\epsilon$-coresets. Our result implies more efficient coreset constructions for a number of interesting problems in machine learning; we show applications to $k$-median/$k$-means, $k$-line clustering, $j$-subspace approximation, and the integer $(j,k)$-projective clustering problem.
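To make the coreset definition in this abstract concrete, the following minimal sketch measures how well a weighted subset reproduces the full cost on a handful of queries. It assumes one-dimensional points and the cost $f(p,q)=(p-q)^2$, and it uses plain uniform sampling with weights $|P|/|S|$ as a stand-in; it is not the sensitivity-sampling framework the paper describes. The function names and the probe queries are hypothetical, and checking a finite probe set is weaker than the definition, which requires the bound for all $q\in Q$.

```python
import random

def cost(points, weights, q):
    """Weighted cost sum_i w_i * f(p_i, q) with f(p, q) = (p - q)^2 on the real line."""
    return sum(w * (p - q) ** 2 for p, w in zip(points, weights))

def max_relative_error(P, S, w, queries):
    """Largest relative gap between the weighted cost on S and the full cost on P."""
    worst = 0.0
    for q in queries:
        full = cost(P, [1.0] * len(P), q)
        worst = max(worst, abs(cost(S, w, q) - full) / full)
    return worst

rng = random.Random(0)
P = [rng.gauss(0.0, 1.0) for _ in range(5000)]
m = 500
S = rng.sample(P, m)                    # uniform sample: a stand-in, NOT sensitivity sampling
w = [len(P) / m] * m                    # each sampled point represents |P| / |S| originals
queries = [-2.0, -1.0, 0.0, 1.0, 2.0]   # finite probe set; the definition needs all q in Q
eps_hat = max_relative_error(P, S, w, queries)
print(f"empirical epsilon on the probe queries: {eps_hat:.3f}")
```

If the printed value is at most $\epsilon$, the sample behaves like an $\epsilon$-coreset on these probe queries only; a guarantee over all queries is what the sensitivity-based constructions are designed to provide.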

    Synaptic Size Dynamics as an Effectively Stochastic Process

    No full text
    Long-term, repeated measurements of individual synaptic properties have revealed that synapses can undergo significant directed and spontaneous changes over time scales of minutes to weeks. These changes are presumably driven by a large number of activity-dependent and independent molecular processes, yet how these processes integrate to determine the totality of synaptic size remains unknown. Here we propose, as an alternative to detailed, mechanistic descriptions, a statistical approach to synaptic size dynamics. The basic premise of this approach is that the integrated outcome of the myriad of processes that drive synaptic size dynamics is effectively described as a combination of multiplicative and additive processes, both of which are stochastic and taken from distributions parametrically affected by physiological signals. We show that this seemingly simple model, known in probability theory as the Kesten process, can generate rich dynamics which are qualitatively similar to the dynamics of individual glutamatergic synapses recorded in long-term time-lapse experiments in ex-vivo cortical networks. Moreover, we show that this stochastic model, which is insensitive to many of its underlying details, quantitatively captures the distributions of synaptic sizes measured in these experiments, the long-term stability of such distributions and their scaling in response to pharmacological manipulations. Finally, we show that the average kinetics of new postsynaptic density formation measured in such experiments is also faithfully captured by the same model. The model thus provides a useful framework for characterizing synapse size dynamics at steady state, during initial formation of such steady states, and during their convergence to new steady states following perturbations.
These findings show the strength of a simple low dimensional statistical model to quantitatively describe synapse size dynamics as the integrated result of many underlying complex processes.
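The combination of multiplicative and additive stochastic processes described in this abstract can be sketched as the update x → ε·x + η, with ε and η drawn afresh at every step. The sketch below assumes Gaussian draws and uses illustrative parameter values (0.99 and 0.01 for the means) that are not taken from the paper. Treating a synapse that reaches zero as permanently lost is also an assumption, and the function names are hypothetical.

```python
import random

def kesten_step(x, rng, eps_mean, eps_sd, eta_mean, eta_sd):
    """One update x -> eps * x + eta with freshly drawn Gaussian eps and eta."""
    if x <= 0.0:
        return 0.0  # assumption: a synapse that reaches zero stays lost
    eps = rng.gauss(eps_mean, eps_sd)   # multiplicative factor
    eta = rng.gauss(eta_mean, eta_sd)   # additive term
    return max(eps * x + eta, 0.0)

def simulate(x0, n_steps, rng, eps_mean=0.99, eps_sd=0.05, eta_mean=0.01, eta_sd=0.03):
    """Return the trajectory [x_0, x_1, ..., x_n] of a single synapse."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(kesten_step(traj[-1], rng, eps_mean, eps_sd, eta_mean, eta_sd))
    return traj

rng = random.Random(1)
trajectory = simulate(x0=1.0, n_steps=320, rng=rng)  # e.g. 160 hours at 30 min steps
print(trajectory[:5])
```

With both standard deviations set to zero the update reduces to a deterministic relaxation toward the fixed point 〈η〉/(1 − 〈ε〉), which is a convenient sanity check before adding noise.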

    Properties of the Kesten process in estimated parameter regime.

    No full text
    (A) Simulated synaptic trajectories of 14 out of 1075 synapses, evolved for 160 hours at 30 min intervals. Synapses were sorted according to initial size and then every 76th trajectory was selected for display (compare with Fig. 1D). The Kesten process parameters used here were based on the estimate shown in Fig. 4 (〈ε〉 = 0.9923±0.05; 〈η〉 = 0.0077±0.03) and values were obtained from Gaussian distributions with these parameters. The initial data set (1087 synapses) was identical to that shown in Figs. 2A and 4; 12 synapses were ‘lost’ during the simulation (i.e. their values reduced to 0) and were excluded from subsequent analysis. (B) Synaptic distributions along time, starting from a measured distribution (thick black line) and applying the time evolution of the Kesten process to this initial population. Four subsequent time points are plotted as indicated. Inset shows the same distributions on a semi-logarithmic scale. (C,D) Examples of k-times iterated mappings corresponding to 24 and 48 time-steps (compare with Fig. 4C,D). (E) Slope of k-times iterated mappings as a function of k in simulated trajectories (circles) and in a theoretical prediction based on Eq. (3) (red solid line, red equation).
(F) Scatter plot of changes in synapse size as a function of initial size for simulated trajectories for the period covering the first 24 hours of the simulation. Note the strong resemblance with the experimental measurements of Fig. 2A.

    nar

    No full text
    ne'er. "We didn't take nar copper from him." Yes. DNE-cit WK 63. Used I and Sup.

    Estimating Kesten parameters in experimental data.

    No full text
    An estimate of the parameter 〈ε〉 can be obtained from k-times iterated mappings of the data as explained in text. These mappings are shown for 1, 8, 24 and 48 time-steps, corresponding to 0.5, 4, 12 and 24 hours respectively (A–D); from each such mapping the slope of the linear regression (solid black line) is extracted. (E) The logarithmic values of these slopes (circles) plotted as a function of iteration number and fit by linear regression (solid black line) to obtain an estimate of 〈ε〉. (F) The measured slopes (circles) with the predicted slope values (red line) over an extended time scale.
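The estimation procedure in this caption can be sketched on synthetic data. For an update x → ε·x + η with independent draws, the regression slope of the size after k steps against the initial size is approximately 〈ε〉^k, so a straight-line fit of the log-slopes against k has slope ln〈ε〉. The sketch below assumes that this is the intended reading of the caption. The population is simulated rather than measured, with Gaussian parameters matching the values quoted elsewhere in this listing (0.9923±0.05 and 0.0077±0.03), a hypothetical lognormal initial distribution, and no clipping of sizes at zero. All function names are hypothetical.

```python
import math
import random

def ols_slope(xs, ys):
    """Slope of the least-squares line y = a + b * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate_population(n, n_steps, rng, eps_mean, eps_sd, eta_mean, eta_sd):
    """Return sizes[t][i]: size of synapse i at step t (synthetic data, not measurements)."""
    sizes = [[rng.lognormvariate(0.0, 0.5) for _ in range(n)]]
    for _ in range(n_steps):
        sizes.append([rng.gauss(eps_mean, eps_sd) * x + rng.gauss(eta_mean, eta_sd)
                      for x in sizes[-1]])
    return sizes

def estimate_eps_mean(sizes, ks=(1, 8, 24, 48)):
    """Fit ln(slope_k) against k; the fitted slope estimates ln<eps>."""
    log_slopes = [math.log(ols_slope(sizes[0], sizes[k])) for k in ks]
    return math.exp(ols_slope(list(ks), log_slopes))

rng = random.Random(2)
sizes = simulate_population(2000, 48, rng, 0.9923, 0.05, 0.0077, 0.03)
eps_hat = estimate_eps_mean(sizes)
print(f"estimated <eps> = {eps_hat:.4f}")
```

The line fit here includes an intercept; whether the original analysis forced the fit through the origin is not stated in the caption.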

    Distribution rescaling with individual rank-order shuffling in the Kesten process.

    No full text
    The Kesten process provides a simple mechanism for population distribution rescaling without individual multiplication by a constant factor. Simulations were performed for 127 synapses (initial values taken from the synapses of Fig. 7). The synapses were first evolved for 24 hours (48 time points) with a Kesten process that preserved the original distribution. At this point 〈ε〉 was slightly increased (from 0.992 to 0.995), and the trajectories were evolved for another 24 hours with the new parameters. (A) Distributions before (blue) and after (red) changing 〈ε〉. (B) Same distributions shown in (A) after scaling. (C) Changes in the fluorescence of individual synapses (ΔF) during the first 24 hours after changing 〈ε〉 (averages and standard deviations of binned data). The green line represents the expected relationships between ΔF and F had sizes of individual synapses scaled through multiplication by 1.14 (the ratio of mean synaptic size before and after changing 〈ε〉). (D) Scaling without preserving rank order. Synapses were sorted according to their size before changing 〈ε〉 and plotted according to their original sizes (blue dots). The ‘sizes’ of the same synapses 24 hours after changing 〈ε〉 are shown as red dots. As in the experiments of Figs. 7 and 8, rank order is not preserved. The expected synaptic ‘sizes’, had scaling occurred multiplicatively, are shown as green dots.
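The two-phase simulation described in this caption can be sketched as follows. Only the population size (127), the phase length (48 steps) and the change of the mean multiplicative factor from 0.992 to 0.995 come from the caption. The standard deviations, the additive mean of 0.008 and the lognormal starting sizes are assumptions made for illustration, and sizes are not clipped at zero for brevity. Function names are hypothetical.

```python
import random

def evolve(sizes, n_steps, rng, eps_mean, eps_sd=0.05, eta_mean=0.008, eta_sd=0.03):
    """Advance every synapse by n_steps Kesten updates (no clipping at zero, for brevity)."""
    for _ in range(n_steps):
        sizes = [rng.gauss(eps_mean, eps_sd) * x + rng.gauss(eta_mean, eta_sd)
                 for x in sizes]
    return sizes

def ranks(values):
    """Rank of each entry (0 = smallest)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = rank
    return out

rng = random.Random(3)
initial = [rng.lognormvariate(-0.125, 0.5) for _ in range(127)]  # hypothetical start, mean ~1
before = evolve(initial, 48, rng, eps_mean=0.992)   # first 24 hours
after = evolve(before, 48, rng, eps_mean=0.995)     # 24 hours with the increased factor

mean_ratio = (sum(after) / len(after)) / (sum(before) / len(before))
moved = sum(rb != ra for rb, ra in zip(ranks(before), ranks(after)))
print(f"mean size ratio: {mean_ratio:.2f}; synapses whose rank changed: {moved} of 127")
```

Because each synapse receives its own random draws, the population mean can shift while individual synapses change places in the size ordering, which is the point the caption makes about rescaling without rank preservation.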

    Changes in the fluorescence of individual synapses as a function of their initial fluorescence.

    No full text
    Each dot represents one synapse. ΔF represents the change in fluorescence after a given time interval. Data were normalized by dividing the fluorescence of each synapse by the average fluorescence of all synapses at time t = 0 to allow pooling of data from multiple neurons irrespective of some variability in neuron-to-neuron expression levels. Solid lines are linear fits; vertical dashed lines highlight the average synaptic size (= 1, after normalization). All data were obtained under baseline conditions from unperturbed networks. (A) Rat cortical neurons expressing PSD-95:EGFP; 1087 synapses from 10 neurons in 5 separate experiments. Images were collected at 30 min intervals; ΔF was measured after a 24 hour interval (see ref [20] for further details). (B) Mouse cortical neurons expressing PSD-95:mTurquoise; 554 synapses from 8 neurons in 6 separate experiments. Images were collected at 25 min intervals; ΔF was measured after a 15 hour interval (see ref [63] for further details). (C) Mouse cortical neurons expressing munc13-1:EYFP; 554 synapses from 8 neurons in 6 separate experiments. Imaging was performed as in B (see ref [63] for further details). (D) Rat cortical neurons expressing mTurquoise2:Gephyrin; 749 synapses from 27 neurons in 4 experiments. Images were collected at 60 min intervals; ΔF was measured after a 24 hour interval (Anna Rubinski and Noam E. Ziv, unpublished data).
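The normalization step in this caption can be sketched as below. The sketch assumes the average is taken per neuron, which is one reading of the caption and fits the stated aim of pooling neurons with different expression levels. The raw values and function names are hypothetical.

```python
def normalize_by_initial_mean(f0, ft):
    """Scale one neuron's synapses so their mean fluorescence at t = 0 equals 1."""
    mean0 = sum(f0) / len(f0)
    return [x / mean0 for x in f0], [x / mean0 for x in ft]

def pooled_changes(neurons):
    """neurons: list of (f0, ft) pairs, one per neuron. Returns pooled (F, dF) points."""
    points = []
    for f0, ft in neurons:
        n0, nt = normalize_by_initial_mean(f0, ft)
        points.extend((a, b - a) for a, b in zip(n0, nt))
    return points

# Hypothetical raw values for two neurons with different expression levels.
dim_neuron = ([100.0, 200.0, 300.0], [150.0, 200.0, 250.0])
bright_neuron = ([1000.0, 3000.0], [1500.0, 2500.0])
points = pooled_changes([dim_neuron, bright_neuron])
print(points)
```

After normalization both neurons contribute points on the same scale, with the average initial size equal to 1, so a single linear fit of ΔF against F can be applied to the pooled data.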

    Kinetics of formation of new postsynaptic densities.

    No full text
    A,B) The formation of a new PSD. Left panel: low magnification image of a dendrite 68 hours after the beginning of a time lapse session (started at 21 days in vitro). Right panels: gradual accumulation of PSD-95:EGFP at a new site (blue arrowhead). Bar: 10 µm. C) Time course of PSD-95:EGFP accumulation at the new site shown in A. The blue dots indicate the time-points of the images shown in B. D) Mean time course of new PSD formation in mature (>21 days in vitro) networks (average ± SEM). Data, pooled from 4 neurons, was aligned to the first time point at which a new PSD was observed. The fluorescence of each new synapse was normalized by subtracting the fluorescence value measured at its future location before a PSD was first detectable, and then divided by the background corrected mean fluorescence of the preexisting PSDs of that neuron. The number of new PSDs used to calculate the data points is shown as an orange line. E) Two simulated trajectories of new synapses, seeded with an initial value of 0.05 and evolved as a Kesten process with parameters 〈ε〉 = 0.962±0.06 and 〈η〉 = 0.038±0.03 (Gaussian distributions). The resulting trajectories were normalized as the experimental data shown in D. F) Mean time course of synapse formation calculated analytically by equation (5) (green) and averaged over 200 simulated Kesten trajectories (red, average ± SEM) evolved and normalized as described in E. Open circles represent the experimentally measured data shown in D. G) Synapse formation in developing networks: mean time course of new PSD formation in developing networks (10–13 days in vitro; average ± SEM). Data, pooled from 3 neurons, was normalized as in D. The number of PSDs used to calculate the data points is shown as an orange line.
H) Mean time course of synapse formation in developing networks calculated analytically based on equation (5) (green) and averaged over 79 simulated Kesten trajectories (red, average ± SEM). The parameters used for these simulations and calculations were 〈ε〉 = 0.74±0.06 and 〈η〉 = 0.26±0.03 (Gaussian distributions; 〈η〉 was constrained by 〈ε〉 as explained in main text). Note that these reflect values for 10 minute steps (equivalent to 〈ε〉 = 0.405 for half hour steps). Open circles represent the experimentally measured data shown in G.
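Two pieces of arithmetic in this caption can be checked with a short sketch. First, if the mean multiplicative factor per 10 minute step is 0.74 and draws are independent, three consecutive steps give 0.74³ per half hour. Second, for an update x → ε·x + η with independent draws, the mean obeys 〈x_t〉 = 〈ε〉^t·x_0 + 〈η〉(1 − 〈ε〉^t)/(1 − 〈ε〉). This is the standard mean recursion for such a process; whether it coincides with the paper's equation (5) is not confirmed by the caption. Function names are hypothetical.

```python
def mean_size(t, x0, eps_mean, eta_mean):
    """Mean of x_t for x_{t+1} = eps * x_t + eta with independent draws:
    <x_t> = <eps>^t * x0 + <eta> * (1 - <eps>^t) / (1 - <eps>)."""
    decay = eps_mean ** t
    return decay * x0 + eta_mean * (1.0 - decay) / (1.0 - eps_mean)

def eps_per_longer_step(eps_mean, n_substeps):
    """Mean multiplicative factor over n_substeps consecutive short steps."""
    return eps_mean ** n_substeps

# Three 10 minute steps make one half hour step.
print(round(eps_per_longer_step(0.74, 3), 3))  # → 0.405

# Mean growth of a new synapse seeded at 0.05 with the mature-network parameters.
curve = [mean_size(t, 0.05, 0.962, 0.038) for t in range(0, 200, 20)]
print([round(v, 3) for v in curve])
```

With 〈η〉 = 1 − 〈ε〉, as in both parameter pairs quoted in the caption, the mean relaxes toward 1, which matches the normalization of new synapses by the mean of preexisting ones.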

    Invariance of Kesten limiting distribution shape to different ε- and η- distributions.

    No full text
    (A) Simulated limiting distributions of Kesten processes with the three different ε-distributions shown in inset, all belonging to the same μ-class 6, that is, 〈ε^6〉 = 1. The distribution of η was held fixed. The same three distributions after scaling are shown on the right. (B) Simulated limiting distributions of Kesten processes with the three different η-distributions shown in the inset. The distribution of ε was held fixed. The same three distributions after scaling are shown on the right.
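The condition 〈ε^6〉 = 1 that defines the μ-class in this caption can be illustrated with a small sketch. It takes two differently shaped sets of positive ε samples and rescales each by a constant so that the empirical sixth moment equals 1. The two distributions, the rescaling approach and the function names are hypothetical and are not the distributions shown in the figure inset.

```python
import random

def moment(samples, mu):
    """Empirical moment <eps^mu> of a list of positive samples."""
    return sum(e ** mu for e in samples) / len(samples)

def rescale_to_mu_class(samples, mu):
    """Multiply all samples by c = <eps^mu>^(-1/mu) so that <(c * eps)^mu> = 1."""
    c = moment(samples, mu) ** (-1.0 / mu)
    return [c * e for e in samples]

rng = random.Random(0)
uniform_eps = [rng.uniform(0.8, 1.1) for _ in range(10000)]      # hypothetical shape 1
two_point_eps = [rng.choice((0.9, 1.05)) for _ in range(10000)]  # hypothetical shape 2

for eps in (uniform_eps, two_point_eps):
    scaled = rescale_to_mu_class(eps, 6)
    print(round(moment(scaled, 6), 6), round(sum(scaled) / len(scaled), 4))
```

Both rescaled sample sets satisfy the same moment condition while keeping different shapes and different means, which is the kind of variation the caption holds the limiting distribution shape to be invariant to.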